Reference (computer Science)
   HOME

TheInfoList



OR:

In
computer programming Computer programming is the process of performing a particular computation (or more generally, accomplishing a specific computing result), usually by designing and building an executable computer program. Programming involves tasks such as ana ...
, a reference is a value that enables a program to indirectly access a particular
data In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted ...
, such as a
variable Variable may refer to: * Variable (computer science), a symbolic name associated with a value and whose associated value may be changed * Variable (mathematics), a symbol that represents a quantity in a mathematical expression, as used in many ...
's value or a record, in the
computer A computer is a machine that can be programmed to Execution (computing), carry out sequences of arithmetic or logical operations (computation) automatically. Modern digital electronic computers can perform generic sets of operations known as C ...
's
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered, ...
or in some other storage device. The reference is said to refer to the datum, and accessing the datum is called
dereferencing In computer programming, the dereference operator or indirection operator, sometimes denoted by "*" (i.e. an asterisk), is a unary operator (i.e. one with a single operand) found in List of C-family programming languages, C-like languages that in ...
the reference. A reference is distinct from the datum itself. A reference is an
abstract data type In computer science, an abstract data type (ADT) is a mathematical model for data types. An abstract data type is defined by its behavior (semantics) from the point of view of a ''user'', of the data, specifically in terms of possible values, pos ...
and may be implemented in many ways. Typically, a reference refers to data stored in memory on a given system, and its internal value is the
memory address In computing, a memory address is a reference to a specific memory location used at various levels by software and hardware. Memory addresses are fixed-length sequences of digits conventionally displayed and manipulated as unsigned integers. Su ...
of the data, i.e. a reference is implemented as a pointer. For this reason a reference is often said to "point to" the data. Other implementations include an offset (difference) between the datum's address and some fixed "base" address, an
index Index (or its plural form indices) may refer to: Arts, entertainment, and media Fictional entities * Index (''A Certain Magical Index''), a character in the light novel series ''A Certain Magical Index'' * The Index, an item on a Halo megastru ...
,
unique key In relational database management systems, a unique key is a candidate key that is not the primary key of the relation. All the candidate keys of a relation can uniquely identify the records of the relation, but only one of them is used as the prim ...
, or
identifier An identifier is a name that identifies (that is, labels the identity of) either a unique object or a unique ''class'' of objects, where the "object" or class may be an idea, physical countable object (or class thereof), or physical noncountable ...
used in a
lookup In computer science, a lookup table (LUT) is an array that replaces runtime computation with a simpler array indexing operation. The process is termed as "direct addressing" and LUTs differ from hash tables in a way that, to retrieve a value v wi ...
operation into an
array An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
or
table Table may refer to: * Table (furniture), a piece of furniture with a flat surface and one or more legs * Table (landform), a flat area of land * Table (information), a data arrangement with rows and columns * Table (database), how the table data ...
, an operating system
handle A handle is a part of, or attachment to, an object that allows it to be grasped and manipulated by hand. The design of each type of handle involves substantial ergonomic issues, even where these are dealt with intuitively or by following tra ...
, a
physical address In computing, a physical address (also real address, or binary address), is a memory address that is represented in the form of a binary number on the address bus circuitry in order to enable the data bus to access a ''particular'' storage cell ...
on a storage device, or a network address such as a URL.


Formal representation

A reference ''R'' is a value that admits one operation, dereference(''R''), which yields a value. Usually the reference is typed so that it returns values of a specific type, e.g.: interface Reference Often the reference also admits an assignment operation store(''R'', ''x''), meaning it is an abstract variable.


Use

References are widely used in programming, especially to efficiently pass large or mutable data as
arguments An argument is a statement or group of statements called premises intended to determine the degree of truth or acceptability of another statement called conclusion. Arguments can be studied from three main perspectives: the logical, the dialectic ...
to
procedures Procedure may refer to: * Medical procedure * Instructions or recipes, a set of commands that show how to achieve some result, such as to prepare or make something * Procedure (business), specifying parts of a business process * Standard opera ...
, or to share such data among various uses. In particular, a reference may point to a variable or record that contains references to other data. This idea is the basis of
indirect addressing Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how the machine language instructions ...
and of many
linked data structure In computer science, a linked data structure is a data structure which consists of a set of data records (''nodes'') linked together and organized by references (''links'' or '' pointers''). The link between data can also be called a connector. In ...
s, such as
linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whic ...
s. References increase flexibility in where objects can be stored, how they are allocated, and how they are passed between areas of code. As long as one can access a reference to the data, one can access the data through it, and the data itself need not be moved. They also make sharing of data between different code areas easier; each keeps a reference to it. References can cause significant complexity in a program, partially due to the possibility of dangling and
wild reference Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references are ...
s and partially because the
topology In mathematics, topology (from the Greek language, Greek words , and ) is concerned with the properties of a mathematical object, geometric object that are preserved under Continuous function, continuous Deformation theory, deformations, such ...
of data with references is a
directed graph In mathematics, and more specifically in graph theory, a directed graph (or digraph) is a graph that is made up of a set of vertices connected by directed edges, often called arcs. Definition In formal terms, a directed graph is an ordered pa ...
, whose analysis can be quite complicated. Nonetheless, references are still simpler to analyze than pointers due to the absence of
pointer arithmetic In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer ''ref ...
. The mechanism of references, if varying in implementation, is a fundamental programming language feature common to nearly all modern programming languages. Even some languages that support no direct use of references have some internal or implicit use. For example, the
call by reference In a programming language, an evaluation strategy is a set of rules for evaluating expressions. The term is often used to refer to the more specific notion of a ''parameter-passing strategy'' that defines the kind of value that is passed to the f ...
calling convention can be implemented with either explicit or implicit use of references.


Examples

Pointers are the most primitive type of reference. Due to their intimate relationship with the underlying hardware, they are one of the most powerful and efficient types of references. However, also due to this relationship, pointers require a strong understanding by the programmer of the details of memory architecture. Because pointers store a memory location's address, instead of a value directly, inappropriate use of pointers can lead to
undefined behavior In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification to which the computer code adheres. This is different from unspecified behavior ...
in a program, particularly due to
dangling pointer Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references ar ...
s or
wild pointer Wild, wild, wilds or wild may refer to: Common meanings * Wild animal * Wilderness, a wild natural environment * Wildness, the quality of being wild or untamed Art, media and entertainment Film and television * ''Wild'' (2014 film), a 2014 A ...
s.
Smart pointer In computer science, a smart pointer is an abstract data type that simulates a pointer while providing added features, such as automatic memory management or bounds checking. Such features are intended to reduce bugs caused by the misuse of poin ...
s are opaque data structures that act like pointers but can only be accessed through particular methods. A
handle A handle is a part of, or attachment to, an object that allows it to be grasped and manipulated by hand. The design of each type of handle involves substantial ergonomic issues, even where these are dealt with intuitively or by following tra ...
is an abstract reference, and may be represented in various ways. A common example are
file handle In Unix and Unix-like computer operating systems, a file descriptor (FD, less frequently fildes) is a process-unique identifier ( handle) for a file or other input/output resource, such as a pipe or network socket. File descriptors typically hav ...
s (the FILE data structure in the C standard I/O library), used to abstract file content. It usually represents both the file itself, as when requesting a
lock Lock(s) may refer to: Common meanings *Lock and key, a mechanical device used to secure items of importance *Lock (water navigation), a device for boats to transit between different levels of water, as in a canal Arts and entertainment * ''Lock ...
on the file, and a specific position within the file's content, as when reading a file. In
distributed computing A distributed system is a system whose components are located on different computer network, networked computers, which communicate and coordinate their actions by message passing, passing messages to one another from any system. Distributed com ...
, the reference may contain more than an address or identifier; it may also include an embedded specification of the network protocols used to locate and access the referenced object, the way information is encoded or serialized. Thus, for example, a
WSDL The Web Services Description Language (WSDL ) is an XML-based interface description language that is used for describing the functionality offered by a web service. The acronym is also used for any specific WSDL description of a web service (also ...
description of a remote web service can be viewed as a form of reference; it includes a complete specification of how to locate and bind to a particular web service. A reference to a
live distributed object Live distributed object (also abbreviated as ''live object'') refers to a running instance of a distributed multi-party (or peer-to-peer) protocol, viewed from the object-oriented perspective, as an entity that has a distinct identity, may encaps ...
is another example: it is a complete specification for how to construct a small software component called a ''proxy'' that will subsequently engage in a peer-to-peer interaction, and through which the local machine may gain access to data that is replicated or exists only as a weakly consistent message stream. In all these cases, the reference includes the full set of instructions, or a recipe, for how to access the data; in this sense, it serves the same purpose as an identifier or address in memory. If we have a set of keys ''K'' and a set of data objects ''D'', any well-defined (single-valued) function from ''K'' to ''D'' ∪ defines a type of reference, where ''null'' is the image of a key not referring to anything meaningful. An alternative representation of such a function is a directed graph called a reachability graph. Here, each datum is represented by a vertex and there is an edge from ''u'' to ''v'' if the datum in ''u'' refers to the datum in ''v''. The maximum
out-degree In mathematics, and more specifically in graph theory, a directed graph (or digraph) is a graph that is made up of a set of vertices connected by directed edges, often called arcs. Definition In formal terms, a directed graph is an ordered pa ...
is one. These graphs are valuable in
garbage collection Waste collection is a part of the process of waste management. It is the transfer of solid waste from the point of use and disposal to the point of treatment or landfill. Waste collection also includes the curbside collection of recyclabl ...
, where they can be used to separate accessible from inaccessible objects.


External and internal storage

In many data structures, large, complex objects are composed of smaller objects. These objects are typically stored in one of two ways: # With internal storage, the contents of the smaller object are stored inside the larger object. # With external storage, the smaller objects are allocated in their own location, and the larger object only stores references to them. Internal storage is usually more efficient, because there is a space cost for the references and dynamic allocation metadata, and a time cost associated with dereferencing a reference and with allocating the memory for the smaller objects. Internal storage also enhances
locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
by keeping different parts of the same large object close together in memory. However, there are a variety of situations in which external storage is preferred: * If the data structure is recursive, meaning it may contain itself. This cannot be represented in the internal way. * If the larger object is being stored in an area with limited space, such as the stack, then we can prevent running out of storage by storing large component objects in another memory region and referring to them using references. * If the smaller objects may vary in size, it is often inconvenient or expensive to resize the larger object so that it can still contain them. * References are often easier to work with and adapt better to new requirements. Some languages, such as
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
,
Smalltalk Smalltalk is an object-oriented, dynamically typed reflective programming language. It was designed and created in part for educational use, specifically for constructionist learning, at the Learning Research Group (LRG) of Xerox PARC by Alan Ka ...
,
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
, and
Scheme A scheme is a systematic plan for the implementation of a certain idea. Scheme or schemer may refer to: Arts and entertainment * ''The Scheme'' (TV series), a BBC Scotland documentary series * The Scheme (band), an English pop band * ''The Schem ...
, do not support internal storage. In these languages, all objects are uniformly accessed through references.


Language support


Assembly

In
assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence be ...
, it is typical to express references using either raw memory addresses or indexes into tables. These work, but are somewhat tricky to use, because an address tells you nothing about the value it points to, not even how large it is or how to interpret it; such information is encoded in the program logic. The result is that misinterpretations can occur in incorrect programs, causing bewildering errors.


Lisp

One of the earliest opaque references was that of the
Lisp A lisp is a speech impairment in which a person misarticulates sibilants (, , , , , , , ). These misarticulations often result in unclear speech. Types * A frontal lisp occurs when the tongue is placed anterior to the target. Interdental lisping ...
language
cons cell In computer programming, ( or ) is a fundamental function in most dialects of the Lisp programming language. ''constructs'' memory objects which hold two values or pointers to two values. These objects are referred to as (cons) cells, conses, ...
, which is simply a record containing two references to other Lisp objects, including possibly other cons cells. This simple structure is most commonly used to build singly
linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whic ...
s, but can also be used to build simple
binary tree In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binary t ...
s and so-called "dotted lists", which terminate not with a null reference but a value.


C/C++

The pointer is still one of the most popular types of references today. It is similar to the assembly representation of a raw address, except that it carries a static datatype which can be used at compile-time to ensure that the data it refers to is not misinterpreted. However, because C has a weak type system which can be violated using casts (explicit conversions between various pointer types and between pointer types and integers), misinterpretation is still possible, if more difficult. Its successor
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
tried to increase
type safety In computer science, type safety and type soundness are the extent to which a programming language discourages or prevents type errors. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that is ...
of pointers with new cast operators, a reference type &, and smart pointers in its standard library, but still retained the ability to circumvent these safety mechanisms for compatibility.


Fortran

Fortran does not have an explicit representation of references, but does use them implicitly in its
call-by-reference In a programming language, an evaluation strategy is a set of rules for evaluating expressions. The term is often used to refer to the more specific notion of a ''parameter-passing strategy'' that defines the kind of value that is passed to the f ...
calling semantics. A Fortran reference is best thought of as an ''alias'' of another object, such as a scalar variable or a row or column of an array. There is no syntax to dereference the reference or manipulate the contents of the referent directly. Fortran references can be null. As in other languages, these references facilitate the processing of dynamic structures, such as linked lists, queues, and trees.


Object-oriented languages

A number of object-oriented languages languages such as
Eiffel Eiffel may refer to: Places * Eiffel Peak, a summit in Alberta, Canada * Champ de Mars – Tour Eiffel station, Paris, France; a transit station Structures * Eiffel Tower, in Paris, France, designed by Gustave Eiffel * Eiffel Bridge, Ungheni, M ...
,
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
, C#, and
Visual Basic Visual Basic is a name for a family of programming languages from Microsoft. It may refer to: * Visual Basic .NET (now simply referred to as "Visual Basic"), the current version of Visual Basic launched in 2002 which runs on .NET * Visual Basic (cl ...
have adopted a much more opaque type of reference, usually referred to as simply a ''reference''. These references have types like C pointers indicating how to interpret the data they reference, but they are typesafe in that they cannot be interpreted as a raw address and unsafe conversions are not permitted. References are extensively used to access and assign objects. References are also used in function/
method Method ( grc, μέθοδος, methodos) literally means a pursuit of knowledge, investigation, mode of prosecuting such inquiry, or system. In recent centuries it more often means a prescribed process for completing a task. It may refer to: *Scien ...
calls or message passing, and reference counts are frequently used to perform
garbage collection Waste collection is a part of the process of waste management. It is the transfer of solid waste from the point of use and disposal to the point of treatment or landfill. Waste collection also includes the curbside collection of recyclabl ...
of unused objects.


Functional languages

In
Standard ML Standard ML (SML) is a general-purpose, modular, functional programming language with compile-time type checking and type inference. It is popular among compiler writers and programming language researchers, as well as in the development of the ...
,
OCaml OCaml ( , formerly Objective Caml) is a general-purpose programming language, general-purpose, multi-paradigm programming language which extends the Caml dialect of ML (programming language), ML with object-oriented programming, object-oriented ...
, and many other functional languages, most values are persistent: they cannot be modified by assignment. Assignable "reference cells" provide mutable variables, data that can be modified. Such reference cells can hold any value, and so are given the polymorphic type α ref, where α is to be replaced with the type of value pointed to. These mutable references can be pointed to different objects over their lifetime. For example, this permits building of circular data structures. The reference cell is functionally equivalent to a mutable array of length 1. To preserve safety and efficient implementations, references cannot be type-cast in ML, nor can pointer arithmetic be performed. It is important to note that in the functional paradigm, many structures that would be represented using pointers in a language like C are represented using other facilities, such as the powerful algebraic datatype mechanism. The programmer is then able to enjoy certain properties (such as the guarantee of immutability) while programming, even though the compiler often uses machine pointers "under the hood".


Perl/PHP

Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offici ...
supports hard references, which function similarly to those in other languages, and symbolic references, which are just string values that contain the names of variables. When a value that is not a hard reference is dereferenced, Perl considers it to be a symbolic reference and gives the variable with the name given by the value.
PHP PHP is a general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by The PHP Group. ...
has a similar feature in the form of its $$var syntax.


See also

*
Reference type In computer programming, data types can be divided into two categories: value types (or by-value types) and reference types (or by-reference types). Value types are completely represented by their meaning, while reference types are references to ano ...
* Abstraction (computer science) * Autovivification *
Bounded pointer In computer science, a bounded pointer is a pointer (computer programming), pointer that is augmented with additional information that enable the storage bounds within which it may point to be deduced. This additional information sometimes takes th ...
*
Linked data In computing, linked data (often capitalized as Linked Data) is structured data which is interlinked with other data so it becomes more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but ...
*
Magic cookie In computing, a magic cookie, or just cookie for short, is a token or short packet of data passed between communicating programs. The cookie is often used to identify a particular event or as "handle, transaction ID, or other token of agreement be ...
*
Variable (programming) In computer programming, a variable is an abstract storage location paired with an associated symbolic name, which contains some known or unknown quantity of information referred to as a ''value''; or in simpler terms, a variable is a named cont ...
*
Weak reference In computer programming, a weak reference is a reference that does not protect the referenced object from collection by a garbage collector, unlike a strong reference. An object referenced ''only'' by weak references – meaning "every chain of ref ...


References


External links


Pointer Fun With Binky
Introduction to pointers in a 3-minute educational video – Stanford Computer Science Education Library {{Web syndication Data types Programming language concepts Primitive types